On Reward Function for Survival
Author
Abstract
Obtaining a survival strategy (policy) is one of the fundamental problems of biological agents. In this paper, we generalize the formulation used in previous research on agent survival and formulate the survival problem as maximizing the multi-step survival probability over future time steps. We introduce a method for converting this maximization into a classical reinforcement learning problem. Under this conversion, the reward function (the negative temporal cost function) is expressed as the log of the temporal survival probability. We also show that the resulting reinforcement learning objective is proportional to the variational lower bound of the original problem. Finally, we empirically demonstrate that an agent learns survival behavior using the reward function introduced in this paper.
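As a rough illustration of the log-survival reward described in the abstract, the sketch below sums per-step rewards of the form log(survival probability) along one trajectory. It is not the paper's implementation: the function names, the sample probabilities, and the assumption that the multi-step survival probability factorizes into a product of per-step probabilities are all hypothetical choices made for this example.

```python
import math


def log_survival_reward(p_survive: float) -> float:
    """Reward for one step: log of the probability of surviving that step.

    Hypothetical helper; the abstract only states that the reward is the
    log of the temporal survival probability.
    """
    if not 0.0 < p_survive <= 1.0:
        raise ValueError("survival probability must be in (0, 1]")
    return math.log(p_survive)


def episode_return(step_probs):
    """Undiscounted return: sum of per-step log-survival rewards."""
    return sum(log_survival_reward(p) for p in step_probs)


# Made-up per-step survival probabilities along one trajectory.
step_probs = [0.99, 0.95, 0.90]

ret = episode_return(step_probs)
# Assumption: surviving all steps has probability equal to the product.
joint = math.prod(step_probs)

# The summed reward and the log of the joint probability agree up to rounding.
print(ret, math.log(joint))
```

Under the stated factorization assumption, maximizing the summed reward along a trajectory is the same as maximizing the log of the probability of surviving every step, which is why a log reward turns a product of probabilities into an additive reinforcement learning objective.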
Similar resources
Investigating the Theory of Survival Analysis in Credit Risk Management of Facility Receivers: A Case Study on Tose'e Ta'avon Bank of Guilan Province
Nowadays, one of the most important topics in risk management of banks, financial, and credit institutions is credit risk management. In this research, the researchers used survival analytic methods for credit risk modeling in terms of the conditional distribution function of default time. As a practical task, the authors considered the reward credit portfolio of Tose'e Ta'avon Bank of Guilan P...
Commentary: New View on Treatment of Drug Dependence
In the 1960s, the discovery of a pleasure system (defined as the reward system) in the brain that may underlie drug reward and addiction encouraged many scientists to investigate the mechanisms by which drug abuse affects central nervous system function. In this regard, investigators developed several drugs targeting the brain reward system for drug dependence therapy. However, no positive results obtain...
Differential Aspects of Natural and Morphine Reward-related Behaviors in Conditioned Place Preference Paradigm
Introduction: Natural rewards are essential for survival. However, drug-seeking behaviors can be maladaptive and endanger survival. The present study was conducted to enhance our understanding of how animals respond to food and morphine as natural and drug rewards, respectively, in a conditioned place preference (CPP) paradigm. Methods: We designed a protocol to induce food CPP and compare it ...
Estimation of the Survival Function for Negatively Dependent Random Variables
Let $\{X_n, n \ge 1\}$ be a stationary sequence of pairwise negative quadrant dependent random variables with survival function $F(x) = P[X > x]$. The empirical survival function $F_n(x)$ based on $X_1, X_2, \ldots, X_n$ is proposed as an estimator for $F(x)$. Strong consistency, both pointwise and uniform, of $F_n(x)$ is discussed.
Investigating the Relationship between Organizational Rewards and Employee Engagement (Case Study of Foolad Derakhshan Company in Arak)
Given the importance of employee engagement over the last decade and the significant role it plays in organizational efficiency, identifying the factors that influence it can help any organization grow and improve, and can help guarantee its survival in today's competitive conditions. One of the factors influencing this concept is organizational r...
Journal title:
- CoRR
Volume abs/1606.05767 Issue
Pages -
Publication date 2016